dialog system
Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems
Cai, Yucheng, Wu, Yuxuan, Huang, Yi, Feng, Junlan, Ou, Zhijian
Large language models (LLMs) have recently been applied to dialog systems. Despite making progress, LLMs are prone to errors in knowledge-intensive scenarios. Recently, approaches based on retrieval augmented generation (RAG) and agent have emerged to improve the factual accuracy by enhancing the LLMs with knowledge retrieved from external knowledge bases (KBs). This is mostly implemented by prompting the LLMs with instructions, examples and the retrieved knowledge. However, LLMs may have difficulty using the retrieved knowledge effectively for response generation, because they are not well trained to do such generation for specific domains. To mitigate this problem, we propose to finetune the LLMs in the RAG-based and agent-based systems with domain-specific data, together with domain-specific external knowledge, which is called knowledge augmented finetuning (KAFT). We base our study on the MobileCS2 dataset, a real-life customer service dialog dataset that features intensive knowledge interactions, to systematically compare the prompting and KAFT techniques in the RAG-based and agent-based systems. Experiment results show that KAFT substantially surpasses prompting in both RAG and agent systems, particularly in terms of factual accuracy. To the best of our knowledge, this paper represents the first solid empirical work to investigate the KAFT idea.
Entriever: Energy-based Retriever for Knowledge-Grounded Dialog Systems
Cai, Yucheng, Li, Ke, Huang, Yi, Feng, Junlan, Ou, Zhijian
A retriever, which retrieves relevant knowledge pieces from a knowledge base given a context, is an important component in many natural language processing (NLP) tasks. Retrievers have been introduced in knowledge-grounded dialog systems to improve knowledge acquisition. In knowledge-grounded dialog systems, when conditioning on a given context, there may be multiple relevant and correlated knowledge pieces. However, knowledge pieces are usually assumed to be conditionally independent in current retriever models. To address this issue, we propose Entriever, an energy-based retriever. Entriever directly models the candidate retrieval results as a whole instead of modeling the knowledge pieces separately, with the relevance score defined by an energy function. We explore various architectures of energy functions and different training methods for Entriever, and show that Entriever substantially outperforms the strong cross-encoder baseline in knowledge retrieval tasks. Furthermore, we show that in semi-supervised training of knowledge-grounded dialog systems, Entriever enables effective scoring of retrieved knowledge pieces and significantly improves end-to-end performance of dialog systems.
Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT
Tayal, Anuja, Salunke, Devika, Di Eugenio, Barbara, Allen-Meares, Paula, Abril, Eulalia Puig, Garcia, Olga, Dickens, Carolyn, Boyd, Andrew
Conversational assistants are becoming more and more popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders which can highlight advantages and disadvantages of more traditional architectures and those based on generative AI. We present a within-group user study to compare two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version of the system was developed in-house with a neurosymbolic architecture, and one is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks and is less verbose than the one based on ChatGPT; on the other hand, the one based on ChatGPT makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.
- Asia > Middle East > Republic of Türkiye (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Alabama (0.04)
- (10 more...)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Improving robot understanding using conversational AI: demonstration and feasibility study
Explanations constitute an important aspect of successful human robot interactions and can enhance robot understanding. To improve the understanding of the robot, we have developed four levels of explanation (LOE) based on two questions: what needs to be explained, and why the robot has made a particular decision. The understandable robot requires a communicative action when there is disparity between the human s mental model of the robot and the robots state of mind. This communicative action was generated by utilizing a conversational AI platform to generate explanations. An adaptive dialog was implemented for transition from one LOE to another. Here, we demonstrate the adaptive dialog in a collaborative task with errors and provide results of a feasibility study with users.
Developing Enhanced Conversational Agents for Social Virtual Worlds
Griol, D., Sanchis, A., Molina, J. M., Callejas, Z.
In this paper, we present a methodology for the development of embodied conversational agents for social virtual worlds. The agents provide multimodal communication with their users in which speech interaction is included. Our proposal combines different techniques related to Artificial Intelligence, Natural Language Processing, Affective Computing, and User Modeling. Firstly, the developed conversational agents. A statistical methodology has been developed to model the system conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from the successive interactions. In addition, the selection of the next system response is adapted considering information stored into users profiles and also the emotional contents detected in the users utterances. Our proposal has been evaluated with the successful development of an embodied conversational agent which has been placed in the Second Life social virtual world. The avatar includes the different models and interacts with the users who inhabit the virtual world in order to provide academic information. The experimental results show that the agents conversational behavior adapts successfully to the specific characteristics of users interacting in such environments.
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
"Dialogue" vs "Dialog" in NLP and AI research: Statistics from a Confused Discourse
Within computing research, there are two spellings for an increasingly important term - dialogue and dialog. We analyze thousands of research papers to understand this "dialog(ue) debacle". Among publications in top venues that use "dialog(ue)" in the title or abstract, 72% use "dialogue", 24% use "dialog", and 5% use both in the same title and abstract. This split distribution is more common in Computing than any other academic discipline. We investigate trends over ~20 years of NLP/AI research, not finding clear evidence of a shift over time. Author nationality is weakly correlated with spelling choice, but far from explains the mixed use. Many prolific authors publish papers with both spellings. We use several methods (such as syntactic parses and LM embeddings) to study how dialog(ue) context influences spelling, finding limited influence. Combining these results together, we discuss different theories that might explain the dialog(ue) divergence.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > New York (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Asia > China (0.04)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.68)
Task-Oriented Dialog Systems for the Senegalese Wolof Language
Mbaye, Derguene, Diallo, Moussa
In recent years, we are seeing considerable interest in conversational agents with the rise of large language models (LLMs). Although they offer considerable advantages, LLMs also present significant risks, such as hallucination, which hinder their widespread deployment in industry. Moreover, low-resource languages such as African ones are still underrepresented in these systems limiting their performance in these languages. In this paper, we illustrate a more classical approach based on modular architectures of Task-oriented Dialog Systems (ToDS) offering better control over outputs. We propose a chatbot generation engine based on the Rasa framework and a robust methodology for projecting annotations onto the Wolof language using an in-house machine translation system. After evaluating a generated chatbot trained on the Amazon Massive dataset, our Wolof Intent Classifier performs similarly to the one obtained for French, which is a resource-rich language. We also show that this approach is extensible to other low-resource languages, thanks to the intent classifier's language-agnostic pipeline, simplifying the design of chatbots in these languages.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- (14 more...)
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
Zink, Oswald, Higuchi, Yosuke, Mullov, Carlos, Waibel, Alexander, Kobayashi, Tetsunori
Effective spoken dialog systems should facilitate natural interactions with quick and rhythmic timing, mirroring human communication patterns. To reduce response times, previous efforts have focused on minimizing the latency in automatic speech recognition (ASR) to optimize system efficiency. However, this approach requires waiting for ASR to complete processing until a speaker has finished speaking, which limits the time available for natural language processing (NLP) to formulate accurate responses. As humans, we continuously anticipate and prepare responses even while the other party is still speaking. This allows us to respond appropriately without missing the optimal time to speak. In this work, as a pioneering study toward a conversational system that simulates such human anticipatory behavior, we aim to realize a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance (EOU), using the middle portion of an utterance. To achieve this, we propose a training strategy for an encoder-decoder-based ASR system, which involves masking future segments of an utterance and prompting the decoder to predict the words in the masked audio. Additionally, we develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information to accurately detect the EOU. The experimental results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU. Moreover, the proposed training strategy exhibits general improvements in ASR performance.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent
Vanderlyn, Lindsey, Väth, Dirk, Vu, Ngoc Thang
Mental models play an important role in whether user interaction with intelligent systems, such as dialog systems is successful or not. Adaptive dialog systems present the opportunity to align a dialog agent's behavior with heterogeneous user expectations. However, there has been little research into what mental models users form when interacting with a task-oriented dialog system, how these models affect users' interactions, or what role system adaptation can play in this process, making it challenging to avoid damage to human-AI partnership. In this work, we collect a new publicly available dataset for exploring user mental models about information seeking dialog systems. We demonstrate that users have a variety of conflicting mental models about such systems, the validity of which directly impacts the success of their interactions and perceived usability of system. Furthermore, we show that adapting a dialog agent's behavior to better align with users' mental models, even when done implicitly, can improve perceived usability, dialog efficiency, and success. To this end, we argue that implicit adaptation can be a valid strategy for task-oriented dialog systems, so long as developers first have a solid understanding of users' mental models.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > France (0.04)
- Oceania > Australia (0.04)
- (11 more...)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.68)
- Education (0.68)
- Health & Medicine (0.47)
- Consumer Products & Services > Travel (0.46)
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Li, Yinghao Aaron, Jiang, Xilin, Darefsky, Jordan, Zhu, Ge, Mesgarani, Nima
The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resources required. The conventional approach of cascading automatic speech recognition (ASR), LLM, and text-to-speech (TTS) models in a pipeline, while effective, suffers from unnatural prosody because it lacks direct interactions between the input audio and its transcribed text and the output audio. These systems are also limited by their inherent latency from the ASR process for real-time applications. This paper introduces Style-Talker, an innovative framework that fine-tunes an audio LLM alongside a style-based TTS model for fast spoken dialog generation. Style-Talker takes user input audio and uses transcribed chat history and speech styles to generate both the speaking style and text for the response. Subsequently, the TTS model synthesizes the speech, which is then played back to the user. While the response speech is being played, the input speech undergoes ASR processing to extract the transcription and speaking style, serving as the context for the ensuing dialogue turn. This novel pipeline accelerates the traditional cascade ASR-LLM-TTS systems while integrating rich paralinguistic information from input speech. Our experimental results show that Style-Talker significantly outperforms the conventional cascade and speech-to-speech baselines in terms of both dialogue naturalness and coherence while being more than 50% faster.
- North America > United States > New York > Monroe County > Rochester (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Asia > Middle East > Jordan (0.04)